
The Plant Phenome Journal

Wiley

Preprints posted in the last 30 days, ranked by how well they match The Plant Phenome Journal's content profile, based on 14 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.

1
Rapid, Non-Destructive Visualization of α-Zein Expression and Grain Protein Concentration in Maize Using the Floury2-RFP Reporter Transgene

Li, C.; Heller, N. J.; Tiskevich, C. J.; Moose, S. P.

2026-05-07 plant biology 10.64898/2026.05.05.723001 medRxiv
Top 0.1%
14.4%

Kernel composition traits in maize, including protein accumulation, are of broad interest. The amount of the most abundant proteins in maize endosperm, the α-zeins, can vary dramatically among genotypes and in response to soil nitrogen supply. Targeted reductions in α-zein accumulation can improve nitrogen utilization and the nutritional quality of maize grain but have traditionally required expensive and destructive phenotyping methods. The Floury2-RFP (Fl2-RFP) reporter gene enables rapid, non-destructive visualization of α-zein accumulation in individual maize kernels under white light. This feature is due to the high expression level programmed by the Fl2 promoter, the stability of zein proteins, and the use of monomeric RFP, which emits fluorescence without the need for multimerization. This study aimed to develop a method to quickly document and quantify Fl2-RFP accumulation using camera or smartphone images of either ears or shelled kernels. Results show images of shelled kernels processed with FIJI software capture the Fl2-RFP reporter phenotype better than images of ears. Fl2-RFP confirms the strong maternal control of α-zein accumulation and, like grain protein concentration, responds to soil nitrogen supply. The Fl2-RFP phenotyping pipeline effectively quantified Fl2-RFP accumulation by color features from both camera and smartphone images. Smartphone imaging of Fl2-RFP in a diverse population of inbreds followed by elastic net regression of extracted image features predicted kernel protein concentration, as measured by near-infrared spectroscopy, with moderate accuracy (R² = 0.68, MAE = 0.76, RMSE = 0.93). The spectral features that were most predictive of kernel protein concentration varied depending on whether the background endosperm color was white or yellow.
The integrated analysis of Fl2-RFP intensity and grain protein concentration indicates genetic variation for kernel protein accumulation and N-responsiveness that is distinct from the well-studied α-zeins. Our findings highlight the Fl2-RFP reporter gene as a valuable tool for investigating the genetic complexity of grain protein concentration and associated traits in maize.

2
Progeny differentiation in faba bean using hyperspectral images and machine learning

Schlichtermann, R.-H.; Warnemuende, S.; Tietgen, H.; Welna, G.; Stahl, A.; Wittkop, B.; Snowdon, R.

2026-05-21 genetics 10.64898/2026.05.19.725957 medRxiv
Top 0.1%
6.3%

Though currently a minor crop, faba bean is a promising source of plant-based protein as global diets shift towards more plant-based nutrition. To realise this potential, advances in breeding and cultivation are crucial. To exploit heterosis, faba bean breeding frequently utilises synthetic cultivars, which involves open pollination of inbred lines to produce a mixture of F1 hybrid seeds and self-pollinated offspring. Pure F1 hybrid cultivars are currently unavailable due to unstable cytoplasmic male sterility (CMS) systems. An ability to distinguish F1 seeds from their parental inbreds via characteristics associated with xenia effects could change this. The xenia effect refers to the influence of paternal pollen on seed traits, for example seed weight and cotyledon cells in faba bean. In this study, we exploited the xenia effect captured in hyperspectral imaging data to develop machine learning scenarios for discriminating between parental and F1 seeds of open pollinated synthetic combinations (Syn-1). The hyperspectral data were pre-processed using Savitzky-Golay filtering to reduce noise and smooth the spectra. Various machine learning algorithms were applied, incorporating Bayesian hyperparameter optimisation. The scenarios achieved up to 98.9 % accuracy in separating parental components of Syn-1. When including all seeds, the model achieved an F1 score of 40.7 %, indicating moderate detection and classification performance. As the harmonic mean of precision and recall, the F1 score accounts for both the correctness of F1 seed detections and the completeness with which F1 seeds were detected. While this approach does not yet enable the development of full hybrid cultivars, it paves the way for hybrid-enriched cultivars. These could help to streamline breeding for synthetic cultivars and potentially increase yields, for example by increasing the proportion of F1 hybrid seeds in synthetic cultivars.
This study extends knowledge of the xenia effect in faba bean and provides a basis for further research aimed at enhancing breeding methods and productivity.
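
The abstract above describes the F1 score as the harmonic mean of precision and recall. The following is a minimal sketch of that calculation only; the function names and the seed counts are hypothetical and are not taken from the preprint.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for hybrid-seed detection (illustrative only):
# 30 hybrid seeds correctly flagged, 20 parental seeds wrongly flagged,
# 60 hybrid seeds missed.
p, r = precision_recall(tp=30, fp=20, fn=60)
print(f"precision={p:.3f} recall={r:.3f} F1={f1_score(p, r):.3f}")
```

Because the harmonic mean is pulled towards the smaller of its two inputs, a classifier that misses many hybrid seeds scores low even when most of its detections are correct.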

3
LIME: a fully automated pipeline for high-throughput quantification of leaf lesions

Tan, D.

2026-05-10 plant biology 10.64898/2026.05.07.723432 medRxiv
Top 0.1%
6.2%

Accurate quantification of leaf lesion severity is essential for plant disease research and phenotyping but is often limited by subjective visual scoring and time-intensive manual image analysis. We present LIME, a fully automated, open-source image analysis pipeline for high-throughput quantification of leaf lesions from disease assay images. LIME integrates zero-shot leaf segmentation using the Segment Anything Model with a convolutional neural network for lesion area estimation. Applied to Arabidopsis thaliana leaves infected with Sclerotinia sclerotiorum, the proposed approach achieved a mean absolute percentage error of 12.9%, comparable to observed intrarater variability in manual scoring. Stratified evaluation across lesion-size groups demonstrated consistent prediction accuracy for small, intermediate, and large lesions, and comparative analysis showed that the deep learning-based model substantially outperformed color-based baseline methods. Under GPU-accelerated execution, LIME processed complete assays containing approximately 200 leaves in 15 minutes, representing an approximate 13-fold reduction in processing time relative to manual annotation. Together, these results indicate that LIME enables objective, reproducible, and scalable quantification of leaf lesion severity in standardized plant pathology assays. The pipeline is released as an open-source tool to support quantitative phenotyping studies.
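
The headline metric in the abstract above is a mean absolute percentage error (MAPE). As a rough illustration of how such a figure is computed, the sketch below compares predicted lesion areas with reference areas; the function name and the example values are assumptions and do not come from LIME.

```python
def mean_absolute_percentage_error(reference, predicted):
    """MAPE in percent: mean of |predicted - reference| / |reference| over all samples."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(r == 0 for r in reference):
        raise ValueError("MAPE is undefined when a reference value is zero")
    ratios = [abs(p - r) / abs(r) for r, p in zip(reference, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical lesion areas (arbitrary units): manual annotation vs. model output.
manual_areas = [10.0, 20.0, 40.0]
model_areas = [11.0, 18.0, 44.0]
print(f"MAPE = {mean_absolute_percentage_error(manual_areas, model_areas):.1f}%")
```

MAPE weights every leaf equally in relative terms, so the same absolute error counts for more on a small lesion than on a large one.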

4
Identifying water stress response haplotypes in barley using latent environmental covariates

Aldiss, Z.; Brunner, S.; Heidariask, B.; Chenu, K.; Van Haeften, S.; Baraibar, S.; Ganesgalingam, D.; Moody, D.; Hickey, L.; Lam, Y.

2026-05-07 plant biology 10.64898/2026.05.04.722807 medRxiv
Top 0.1%
4.7%

Purpose: Genotype-by-environment (G x E) interactions represent a major obstacle to increasing genetic gain in crop breeding, with the underlying physiological drivers often remaining obscured within conventional statistical models. This case study presents a novel framework that transforms the latent factors from Factor Analytic (FA) multi-environment trial (MET) models into heritable quantitative traits, enabling the genetic dissection of adaptive response patterns. Methods: A Factor Analytical Linear Mixed Model (FA-LMM) was fit to plot-level yield data for 1,036 barley genotypes across eight Australian trials. Results: Correlation of the factor loadings with APSIM-simulated environmental covariates demonstrated that the second latent factor FA2 was strongly correlated with the Water Stress Index (r = -0.83) during the critical flowering period, establishing water availability as the main biological axis of crossover G x E. Genotypic scores for the derived traits, Overall Performance (OP) and Water Stress Response (WSR), were subjected to high-resolution haplotype-based mapping using local Genomic Estimated Breeding Values (GEBV). Conclusion: This analysis successfully identified major genomic regions that accounted for a substantial proportion of the additive genetic variance. Gene Ontology enrichment of candidate genes within the top haploblocks implicated fundamental pathways related to energy homeostasis, root development, and stress response, with notable candidates including FTsH11, BPS1, and TDP1. The distribution of favourable Haplotypes of Interest (HOI) in elite cultivars suggested a historical signature of inadvertent selection for these adaptive mechanisms. This framework provides an explicit bridge between statistical modelling and functional genomics, offering breeders actionable genetic targets for accelerated development of climate-resilient cereals.

5
Easy to use and low cost leaf disease quantification workflow using Ilastik

Prouvost, A.; Connesson, L.; Le Gourrierec, T.; Freville, H.; David, J.; Plessis, C.; Magnier, B.

2026-05-16 plant biology 10.64898/2026.05.14.719059 medRxiv
Top 0.1%
3.9%

Accurate and reproducible assessment of foliar disease severity is essential for evaluating the performance of heterogeneous plant communities and understanding host-pathogen interactions. However, traditional visual scoring methods remain subjective, imprecise, and difficult to scale in large phenotyping experiments. Here, we present a semi-automated image analysis workflow designed to quantify multiple foliar disease symptoms simultaneously on wheat flag leaves sampled from varietal mixtures. The workflow combines three methodological components: (i) a standardized protocol for leaf sampling and imaging, (ii) supervised machine learning segmentation using Random Forest implemented in Ilastik to classify multiple symptoms (powdery mildew and yellow rust), and (iii) a graphical user interface facilitating pipeline deployment by non-specialist operators. To evaluate the influence of image representation on classification performance, four color spaces (RGB, HSV, HLS, LAB) were systematically compared. The approach was validated using images of durum wheat flag leaves collected from a field experiment assessing eight-way varietal mixtures under natural fungal pressure. Cross-validation against manually annotated images demonstrated high segmentation accuracy across all symptoms. Comparison among color spaces revealed only minor differences in performance. Overall, this workflow offers a cost-effective, annotation-efficient and reproducible alternative to deep learning approaches, leveraging open-source and actively maintained tools while requiring limited training data and enabling objective, reproducible and scalable disease phenotyping.

6
Reaction Norm Modeling of High-Dimensional Genomic and Environmental Data Improves Prediction Accuracy in Winter Wheat

Acharya, S. R.; Garcia-Abadillo, J.; Lyerly, J.; Brown-Guedira, G.; Jarquin, D.; Bandillo, N.

2026-05-08 genetics 10.64898/2026.05.05.722758 medRxiv
Top 0.1%
3.9%

Genomic prediction models that account for genotype-by-environment (GxE) interactions have the potential to accelerate the rate of genetic gain for yield and agronomic performance, yet relatively few studies have applied GxE prediction in public soft red winter wheat (Triticum aestivum) breeding programs. In this study, we extended a reaction norm-based genomic prediction framework by integrating weather-based environmental covariates to more effectively capture genotype-by-environment interactions. Key agronomic traits, including seed yield, plant height, test weight, and heading date, were evaluated across 33 environments (location-year) using over 3,200 breeding lines from the North Carolina State University small grains breeding program. Multiple genomic prediction models were compared using several cross-validation (CV) schemes representing common breeding scenarios. Across traits, the reaction norm M5 model, which incorporates both GxE and genotype-by-environmental covariate interactions (GxO), achieved the highest prediction accuracy (PA) in CV2 (predicting incomplete field trials) and CV1 for yield and test weight (predicting new lines). The highest PA was observed for test weight under CV2 (0.54) and for yield under CV1 (0.41). Under CV0 (predicting new environments), the M3 model incorporating GxE produced the highest PA across traits, with the greatest accuracy for plant height (0.45), although differences among M2, M3, and M4 were small. Prediction under CV00 (predicting new lines in new environments) remained more challenging, with PA values of 0.10-0.20 across traits. Overall, our results demonstrate that integrating environmental covariates into genomic prediction models can improve predictive performance across diverse wheat-growing environments in North Carolina, supporting their utility for applied breeding efforts.
Core ideas: (1) Integrating genotype-by-environment (GxE) interactions with environmental covariates improves prediction accuracy across environments. (2) Model performance varies by prediction scenario, with different approaches performing best for new lines, incomplete trials, or new environments. (3) Prediction of new lines in new environments remains challenging.
Plain language summary: This study explores how adding environmental information to genomic prediction models can improve prediction accuracy in a public winter wheat breeding program. Using data from multi-environment trials conducted across diverse conditions in North Carolina, we evaluated statistical models that capture how different wheat lines respond to changing environments. By incorporating weather data, we improved the ability to predict performance across locations and years. These findings provide practical insights for refining selection strategies and accelerating genetic gain in wheat breeding.

7
Novel linkage disequilibrium-based genotype-by-environmental interaction method for genomic prediction of cotton yield and fibre quality traits

Li, Z.; Li, X.; Liu, S.; Wilson, I.; Zhu, Q.-H.; Stiller, W.; Conaty, W.

2026-05-06 plant biology 10.64898/2026.05.03.722538 medRxiv
Top 0.1%
3.6%

Genomic prediction (GP) across diverse environments has the potential to accelerate genetic gain in cotton breeding programs. A major challenge in GP is modelling genotype-by-environment interactions (GEI), which is essential for selecting stable and high-performing genotypes under variable production conditions. However, incorporating GEI into GP models increases dimensionality and computational complexity, and the resulting run times and computational demands can make such models impractical for commercial breeding-scale data sets. This study addresses two primary aims. First, we evaluate the practical benefits of GEI-informed GP for predicting economically important cotton traits. Second, we develop and assess advanced statistical modelling strategies for integrating genomic and environmental data at scale. We propose a dimensionality reduction approach that combines linkage disequilibrium network analysis with principal component techniques to reduce redundancy while preserving informative variation. Using this reduced dataset, we implement Bayesian linear regression models and, for comparison, deep residual neural networks for genomic prediction. Analyses were conducted on a large multi-environment dataset from the CSIRO cotton breeding program, comprising 3,236 breeding lines, 54 environmental covariates, and 8,049 yield and fibre quality phenotype records collected over 10 years and 9 locations representing 41 year-location combinations. Results demonstrate that Bayesian linear regression approaches generally outperform BG-BLUP models, with all three linear/linear mixed methods providing clearly more reliable performance than the deep learning models. These findings highlight the value of using interpretable statistical models for integrating genomic and environmental information to support selection decisions under diverse environmental conditions.

8
LeafyVGG-16: Transfer Learning for Plant Disease Detection with Cyber Risk Analysis

Chiwele, N.; Sweeney, E.; Hossain, K.

2026-05-18 plant biology 10.64898/2026.05.13.724946 medRxiv
Top 0.1%
2.7%

Plant disease detection using deep learning is essential for precision agriculture, enabling early and automated crop health monitoring. This study proposes an end-to-end transfer learning pipeline, LeafyVGG-16, for multi-class classification of plant diseases and nutrient deficiencies using a tomato leaf dataset. The framework integrates data preprocessing, augmentation, and a VGG-16 backbone with a two-stage fine-tuning strategy. The proposed model is evaluated against CNN, DenseNet-121, Inception-V3, EfficientNetB0, and ResNet-50, achieving an accuracy of 0.93 with precision, recall, and F1-scores of 0.93, 0.90, and 0.92, respectively. These results demonstrate the effectiveness of transfer learning for fine-grained plant disease recognition. We further evaluate model robustness under adversarial cyber attacks to assess deployment reliability in agricultural systems. Under Fast Gradient Sign Method (FGSM) attacks (ε = 0.01-0.05), the model shows an accuracy drop of 1%-7.5%, while Projected Gradient Descent (PGD) attacks (ε = 0.05, step size = 0.005, 10 iterations) produce similar degradation, highlighting the model's vulnerability to adversarial perturbations. These findings highlight potential security and reliability risks in AI-based agricultural decision-making systems. Future work will focus on improving robustness and cyber-resilience and extending this framework to other crops for secure and context-aware deployment in resource-constrained environments.
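
The abstract above evaluates robustness under the Fast Gradient Sign Method. To show the mechanics of a single FGSM step, the sketch below applies it to a tiny logistic-regression classifier rather than to VGG-16; the model, weights, input values and the clipping to [0, 1] are all assumptions chosen for illustration and are not the authors' setup.

```python
import math


def predict_proba(weights, bias, x):
    """Probability of the positive class under a logistic-regression model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def fgsm_step(weights, bias, x, y, epsilon):
    """One FGSM step: x_adv = x + epsilon * sign(dLoss/dx) for binary cross-entropy.

    For logistic regression, the gradient of the loss with respect to the
    input is (p - y) * w, so no autodiff library is needed here.
    """
    p = predict_proba(weights, bias, x)
    grad = [(p - y) * w for w in weights]
    signs = [(g > 0) - (g < 0) for g in grad]
    # Assumption: inputs are pixel-like values scaled to [0, 1], so clip to that range.
    return [min(1.0, max(0.0, xi + epsilon * s)) for xi, s in zip(x, signs)]


# Hypothetical model and input (illustrative values only).
weights, bias = [2.0, -3.0, 1.0], 0.1
x, y = [0.5, 0.2, 0.7], 1
x_adv = fgsm_step(weights, bias, x, y, epsilon=0.05)
print("clean p(y=1):      ", round(predict_proba(weights, bias, x), 4))
print("adversarial p(y=1):", round(predict_proba(weights, bias, x_adv), 4))
```

Each input feature moves by at most ε, yet every move is in the direction that increases the loss, which is why small perturbations can reduce accuracy. PGD, also mentioned in the abstract, repeats smaller steps of this kind while keeping the result within the ε bound.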

9
Effects of leaf removal on photosynthetic activity, fruit yield, and quality of micro-dwarf tomatoes

Usenko, D.; Giladi, C.; Ziv, C.; Helman, D.

2026-05-13 plant biology 10.64898/2026.05.10.724098 medRxiv
Top 0.1%
2.3%

Micro-dwarf tomato cultivars are increasingly considered for urban and controlled-environment agriculture due to their compact architecture and suitability for high-density planting. However, optimal canopy management strategies for these cultivars remain poorly defined. In this study, we evaluated the effects of different leaf removal intensities on leaf-level physiological performance, fruit yield, and fruit quality in three micro-dwarf tomato cultivars (Mohammed, Hahms Gelbe Topftomate, and Red Robin) grown under contrasting seasonal light conditions. Plants were subjected to low (15%), moderate (30%), or severe (90%) leaf removal, and leaf-level gas exchange was measured across canopy layers, along with yield and fruit quality assessments. Severe leaf removal (90%) increased carbon assimilation, transpiration, and stomatal conductance in middle and lower canopy leaves by up to approximately twofold compared with control plants, indicating improved light availability at the leaf level. However, these physiological enhancements did not consistently translate into higher yield, reflecting reduced whole-plant source capacity under excessive leaf removal. Low to moderate leaf removal (15-30%) generally increased or maintained yield and fruit number, whereas severe leaf removal reduced yield in Hahms Gelbe and Red Robin, particularly under low seasonal radiation. In contrast, Mohammed exhibited yield increases of up to 220% under low leaf removal and maintained increased yield even under severe leaf removal under high-light conditions. Fruit quality was largely unaffected by leaf removal, except for total soluble solids, which declined by approximately 12% under severe leaf removal across cultivars, consistent with sugar dilution under source limitation. Overall, these results demonstrate that optimal leaf removal in micro-dwarf tomatoes requires balancing improved canopy light distribution with maintenance of sufficient leaf area for carbon assimilation. 
Leaf removal thresholds are strongly cultivar- and light-dependent, emphasizing the need for cultivar-specific canopy management strategies in compact tomato systems and controlled-environment agriculture.

10
LOCOPOTS: a low-cost high-throughput screening platform for in vitro potato phenotyping under abiotic stress

Saiz-Fernandez, I.; Bastidas Parrado, L. A.; Klimes, P.; Cavar Zeljkovic, S.; Ruiz de Galarreta, J. I.; Leyva-Perez, M. d. l. O.; Ortiz-Barredo, A.; Spichal, L.; De Diego, N.

2026-05-14 plant biology 10.64898/2026.05.12.724622 medRxiv
Top 0.1%
2.0%

Potato is highly vulnerable to abiotic stresses such as salinity and low nutrient availability. Rapid identification of stress-resilient genotypes is therefore essential for breeding, yet conventional phenotyping is often slow, space-demanding and expensive. We present LOCOPOTS -- a LOw-COst high-throughput screening platform for in vitro POTatoes under abiotic Stress -- which combines individual in vitro plant culture, low-cost RGB imaging and machine-learning-based automatic segmentation using a trained convolutional neural network based on the U-Net architecture. LOCOPOTS enabled the automated extraction of growth, colour, and vegetation-index traits and demonstrated robust performance across independent phenotyping rounds. We screened 30 potato varieties under control, low-nutrient and salinity conditions, identifying contrasting growth and physiological responses. Integrated traits such as final area and height, Area_AUC and height_AUC, together with GLI, Chol, cive and chlorophyll fluorescence parameters, discriminated genotype performance under stress. Metabolic profiling further revealed genotype-specific reprogramming in carbon and nitrogen metabolism under low nutrition and salt stress, including changes in fructose, myo-inositol, β-aminobutyric acid, γ-aminobutyric acid, proline, and certain polyamines, identifying them as specific chemical biomarkers of plant stress responses. LOCOPOTS provides a scalable, affordable and space-efficient platform for early screening of potato genetic diversity and identification of candidate traits associated with stress resilience.

11
Crop yields under simulated nuclear winter: a growth chamber experiment

Blouin, S.; Abrams, D. R.; Ben-Zeev, R.; Anderson, C. T.; Lasky, J. R.; Denkenberger, D.

2026-05-07 plant biology 10.64898/2026.05.05.723012 medRxiv
Top 0.1%
1.9%

A global nuclear war could inject soot into the stratosphere, blocking sunlight and causing rapid cooling. Assessments of the resulting agricultural collapse rely on crop models never validated under such conditions. We grew wheat, canola, and potato in growth chambers simulating the light and temperature of an extreme nuclear winter at tropical and temperate sites. In the tropical chamber (18-20 °C, 200 µmol m-2 s-1 PAR), all three crops produced viable yields. Wheat yielded 2.1-2.3 t/ha (n=3 well-watered, n=3 water-stressed pots), 60% of the global average, and single-pot observations of canola and potato suggested biological yields comparable to global averages. In the temperate chamber simulating nuclear winter irradiance (60-360 µmol m-2 s-1), wheat stems collapsed under their own weight. Although hand-harvesting recovered 0.6-2.8 t/ha of grain, mechanical field harvest of a flat canopy would recover substantially less. This failure mode was not observed in a higher-light control chamber and is not captured by existing crop models, which may therefore overestimate temperate cereal production under nuclear winter. Canola produced comparable yields under both temperate light regimes without lodging. Empirical screening of additional staples is needed to identify which remain viable under nuclear winter.

12
Dim Green Light Enables Day-and-Night Monitoring of Leaf Movements

Herrero, E.; Gill, A. R.; Wijeweera, S.; Ginzburg, D.; Stamford, J. D.; Antoniades, A.; Bromley, J. R.; Mortimer, J.; Gilliham, M.; Millar, H.; Webb, A. A.

2026-05-09 plant biology 10.64898/2026.05.08.723725 medRxiv
Top 0.1%
1.8%

Understanding plant growth dynamics requires imaging across day-and-night cycles to quantify growth, movement and development in the aerial plant body and to capture the rhythmic nature of these processes. This requires imaging in light during the day and in darkness at night without perturbing plant physiology. Nighttime imaging has typically depended on infrared (IR) illumination, producing monochrome datasets that require specialised hardware and separate analysis pipelines when combined with daytime RGB imaging. Here, we evaluated very low-intensity green (dimG) illumination from standard LEDs as a practical alternative for colour-consistent nighttime imaging and assessed its physiological impact in Arabidopsis thaliana and Lactuca sativa (lettuce). We show that high resolution colour images can be obtained under dimG using low-cost cameras, with sufficient consistency between full-spectrum and dimG images to allow direct comparison and unified image analysis. We show that very low-fluence green light (<0.5 µmol m-2 s-1) does not sustain circadian oscillations of gene activity under continuous exposure and does not perturb rhythms when applied during the dark phase of diel cycles. DimG imaging enabled accurate detection of diel leaf movement profiles in Arabidopsis circadian mutants, revealing genotype-specific phase differences under varying photoperiods. In lettuce, dimG pulses and continuous dimG enabled accurate quantification of diel leaf movement without affecting growth, stomatal opening, electron transport rate or chlorophyll content. Motion profiles under continuous dimG mirrored those under darkness. Our findings establish dim green illumination as a cost-effective solution for night-time imaging, simplifying phenotyping workflows with minimal impact on physiology.

13
Increasing Phenomic Prediction Efficiency Using A Principal Component Analysis Based Pre-Processing Of Near Infrared Spectra

Bienvenu, C.; Roger, J.-M.; Sene, M.; Castro Pacheco, S. A.; Singer, M.; Felaniaina, B. L.; Terrier, N.; De Bellis, F.; Pot, D.; DE VERDAL, H.; Segura, V.

2026-05-13 genetics 10.64898/2026.05.10.724118 medRxiv
Top 0.1%
1.7%

Phenomic prediction (PP) is a breeding value prediction method using near infrared spectroscopy (NIRS). Spectra pre-processing is a key step in the analysis pipeline of PP and generally involves chemometrics methods. However, there is still little understanding in the genetics community of what pre-processing does and why it improves performance. Consequently, the choice of pre-processing is made either arbitrarily or through a search for the optimal set of methods and associated parameters. In this study, we propose a PCA-based pre-processing method where genetic values of spectra are estimated on a set of principal components instead of individual wavelengths. This way, estimations are based on a few informative and orthogonal features of spectra instead of many correlated, uninformative wavelengths. We tested this new pre-processing method on five data sets representing four plant species (maize, rice, sorghum and grapevine). Results show that it performs as well as, or better than, the best classical chemometric pre-processing methods in almost all cases. Combining PCA-based and classical chemometric pre-processing methods maximizes predictive ability. Moreover, this pre-processing method opens up possibilities of better understanding and selecting parts of the spectral information that are relevant for the prediction of breeding values. Indeed, components representing together about 1% of spectral variability were found to be responsible for most of PP predictive ability.
Plain language summary: Cultivated plants are the result of a breeding process during which their genetic values are used to select those to breed. Estimation of breeding values requires heavy experimental means and is time consuming. Phenomic prediction is a low cost and high throughput genetic value estimation method that is increasingly being used. It often uses near infrared spectroscopy measurements as predictors of genetic values; these measurements are easy to collect and thus routinely used in many species. However, near infrared spectra generally require pre-processing before being used in prediction. Currently used pre-processing methods come from the chemometrics community, and geneticists still lack an in-depth understanding of them. In this study, we propose a new pre-processing approach that performs as well as or better than the best chemometric pre-processing generally used, reduces computation time, and allows for a better understanding of what parts of spectral information are relevant for prediction.
Core ideas: (1) Working on principal components of spectra instead of wavelengths increases predictive ability of phenomic prediction and performs as well as or better than classical chemometrics pre-processing. (2) Working on principal components of spectra requires less optimization of parameters than chemometrics pre-processing. (3) About 1% of spectral variance is responsible for most of the predictive power of phenomic prediction. (4) Working on principal components of spectra pre-processed with classical chemometrics pre-processing can increase predictive ability even more. (5) PCA-based methods are valuable to optimize predictive ability of phenomic prediction and could be used more widely in the quantitative genetics field.

14
DeepBioGS: a hybrid framework for integrating crop growth modelling with genomic prediction through neural networks

Jighly, A.; Joukhadar, R.; Trethowan, R.; Daetwyler, H.; Spangenberg, G.

2026-05-21 plant biology 10.64898/2026.05.11.724249 medRxiv
Top 0.1%
1.7%

Ensuring global food security under rapid climate change demands accelerated genetic gain and breeding strategies that address complex Genotype-by-Environment (GxE) interactions. Traditional genomic selection models often fail to account for novel or extreme climates. Furthermore, integrating mechanistic crop growth models (CGMs) using traditional Bayesian frameworks to solve this issue presents severe computational bottlenecks. Here, we introduce DeepBioGS, a novel hybrid framework that integrates genomic selection with biophysical growth modelling via a fully differentiable deep learning architecture. DeepBioGS utilises a parameter-prediction multi-layer perceptron to map high-dimensional genomic markers to latent, highly heritable physiological traits (Genotype-Specific Parameters; GSP). These parameters mechanistically predict crop phenology across diverse environments. Using two multi-environment wheat datasets comprising over 6,000 genotypes, DeepBioGS extracted latent traits with near-perfect SNP-based heritability values (0.95-1.00). Crucially, the framework demonstrated superior or comparable predictive accuracy (up to r² = 0.77) against standard genomic best linear unbiased prediction (GBLUP) and traditional Bayesian CGM-WGP models. Its architecture drastically improved computational scalability by enabling standard backpropagation, effectively bypassing the stochastic sampling limitations of approximate Bayesian methods. Most importantly for climate adaptation, DeepBioGS allowed accurate forecasting of genotype performance in entirely unobserved environmental conditions. By merging the representational power of deep learning with the structural constraints of biophysics, DeepBioGS provides a highly scalable, interpretable tool to navigate GxE interactions, enabling the assessment of cultivars under future climate scenarios, thus optimising crop breeding for a changing global environment.

15
A decade of disease survey data in a progeny-provenance trial: Dothistroma needle blight in Scots pine

Perry, A.; Moore, B.; Jones, S.; Kaur, S.; Crampton, B.; Gurung, A.; Stockan, J. A.; Cottrell, J. E.; Beaton, J. K.; Cavers, S.

2026-05-14 ecology 10.64898/2026.05.12.724484 medRxiv
Top 0.1%
1.7%

Longitudinal data on disease susceptibility in forest trees are rare but essential for understanding host-pathogen dynamics and genetic variation in susceptibility traits. We present a long-term multisite common garden dataset quantifying susceptibility of Scots pine (Pinus sylvestris) to Dothistroma needle blight. The dataset comprises annual disease assessments collected from the same trees across 11 years, spanning 168 families and 21 Scottish provenances. This design enables partitioning of genetic and environmental sources of variation, evaluation of temporal stability in host response, and estimation of variance components and narrow-sense heritability of susceptibility. The data support analyses of phenotypic plasticity, provenance-level responses, and interactions between disease susceptibility and other adaptive traits. This resource will facilitate predictive modelling of host susceptibility under current and future environmental conditions.

16
The stability of fatty acid composition in sunflower oil is dependent on environment and affected by structural variation

Ingold, M.; Gao, Q.; Mandel, J. R.; McNellie, J. P.; Keepers, K. G.; Barb, J. G.; Burke, J. M.; Rieseberg, L. H.; Hulke, B. S.

2026-05-07 plant biology 10.64898/2026.05.04.722759 medRxiv
Top 0.2%
1.5%

In sunflower (Helianthus annuus L.), the composition of fatty acids in the seeds, primarily oleic, linoleic, stearic and palmitic acid, is of utmost importance for oil quality. Despite this, the genetic basis of this trait and its interaction with the environment is poorly understood. Understanding this interaction is critical to improvement of sunflower within the context of climate change. In this work, we incorporated fatty acid composition measurements from the sunflower SAM population and eight environments across an extensive geographic cline into GWAS. The SAM panel consists of 287 varieties representing approximately 90% of sunflower diversity, for which 2.2 million high-quality SNPs with a MAF > 5% are available. For increased power, multivariate GWAS was performed with four different inputs: (i) mean fatty acid composition within each environment, (ii) mean fatty acid composition within each environment omitting high oleic varieties, (iii) trait stability within environments quantified by standard errors among replicate samples ( stability) and (iv) Eberhart and Russell's β, which quantifies trait stabilities across environments (β stability). All four analyses yielded highly significantly associated SNPs. We found that high oleic varieties exhibited high β trait stability, resulting in substantial overlap in markers between analyses (i) and (iv), with signals being fairly consistent between environments in analysis (i). For analyses (ii) and (iii), significant markers tended to vary between trials. For significant SNPs across all analyses, 147 candidate genes were identified, including promising candidates such as 15 fatty acid metabolism genes, 6 heat shock proteins and 22 transcription factors. Lastly, a large introgression consisting of two flanking inverted sequences on Chromosome 5 was found to coincide with stability in the Georgia trial, suggesting a role in FA composition stability under high heat conditions.
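As a rough illustration of the across-environment stability measure named in this abstract, the sketch below computes an Eberhart and Russell regression coefficient for each genotype. It regresses the genotype's trait value on an environmental index, taken here as the environment mean minus the grand mean. The function name, the line names and the oleic acid values are hypothetical and are not taken from the preprint, whose exact computation may differ.

```python
from statistics import mean


def eberhart_russell_beta(trait_by_genotype):
    """Return {genotype: beta} from {genotype: [trait value per environment]}.

    Assumes every genotype was measured once in every environment,
    with environments listed in the same order for all genotypes.
    """
    genotypes = list(trait_by_genotype)
    n_env = len(trait_by_genotype[genotypes[0]])

    # Environmental index: mean of all genotypes in an environment
    # minus the grand mean over environments.
    env_means = [
        mean(trait_by_genotype[g][j] for g in genotypes) for j in range(n_env)
    ]
    grand_mean = mean(env_means)
    index = [m - grand_mean for m in env_means]
    ss_index = sum(i * i for i in index)

    # Slope of the regression of each genotype's values on the index.
    return {
        g: sum(y * i for y, i in zip(trait_by_genotype[g], index)) / ss_index
        for g in genotypes
    }


# Hypothetical oleic acid percentages for three lines in four environments.
example = {
    "stable_line": [80.0, 80.5, 81.0, 81.5],
    "average_line": [30.0, 34.0, 38.0, 42.0],
    "responsive_line": [25.0, 32.5, 40.0, 47.5],
}
betas = eberhart_russell_beta(example)
```

In this formulation the coefficients average to 1 across genotypes. A value near 1 indicates an average response to the environment, and a value well below 1 indicates a trait that changes little as environments change.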

17
Efficient Optimization of Genotype Pairs for Intercropping using Genomic Prediction and Bayesian Optimization

Kinoshita, S.; Iwata, H.

2026-05-18 genomics 10.64898/2026.05.15.725387 medRxiv
Top 0.2%
1.2%

Intercropping is a promising strategy to improve productivity and sustainability in agricultural systems, but designing effective genotype combinations remains a major challenge owing to the rapid increase in possible pairings as the number of candidate genotypes increases. This creates a practical bottleneck because field evaluation of all combinations is infeasible under realistic resource constraints. Here, we propose a framework that integrates genomic prediction and Bayesian optimization to support efficient decision-making for intercropping system design. Using genome-wide marker data from sorghum and soybean, we simulated intercropping performance across 5,214 genotype pairs under certain genetic architectures, including variation in heritability, correlations between direct and indirect genetic effects, and the contribution of pair-specific interactions. Genomic prediction models incorporating direct and indirect genetic effects substantially improved prediction accuracy compared with models based on direct genetic effects alone, and inclusion of specific mixing ability further enhanced the performance under high-heritability conditions. When coupled with Bayesian optimization, the models rapidly identified superior genotype pairs, requiring fewer evaluation cycles than random or prediction-only search strategies. Acquisition functions that account for predicted uncertainty were most effective in complex scenarios involving interaction effects or negative correlations between direct and indirect effects. These results demonstrate that combining genomic prediction with Bayesian optimization can substantially reduce the experimental burden associated with intercropping design, while improving the efficiency of identifying high-performing genotype pairs. 
The proposed framework provides a practical approach for prioritizing candidate mixtures in breeding and field evaluation, and contributes to the development of data-driven strategies for sustainable agricultural systems.

Highlights
- A data-driven framework was developed to optimize genotype pairs in intercropping.
- Modeling indirect effects improved prediction accuracy across genotype pairs.
- Pair-specific interactions enhanced prediction under high-heritability conditions.
- Bayesian optimization identified superior pairs under limited evaluation capacity.
- The framework reduces field-testing requirements for intercropping system design.

18
Reduction of Pollen Number and Anther Length in Bread Wheat Studied by a Nested Association Mapping Population

Hamaya, N.-B.; Kakui, H.; Okada, M.; Jilu, N.; Jung, K.; Nitta, M.; Wicker, T.; Keller, B.; Nasuda, S.; Shimizu, K. K.

2026-05-23 plant biology 10.64898/2026.05.22.727104 medRxiv
Top 0.2%
1.2%

The number of pollen grains, which carry male gametes in seed plants, has attracted interest in genetics, evolution, and breeding. Rapid evolutionary reductions in pollen number and anther length were reported in selfing species as well as domesticated species, although this poses a challenge for hybrid breeding. Here, we studied the variation of pollen number and anther length of the hexaploid bread wheat (Triticum aestivum) by employing a quick pollen counting method. Pollen numbers in cultivars were lower than those in landraces among 54 lines of diverse geographic origins. Using the year of registration of traditional and modern cultivars, we found a reduction in pollen number over the past 150 years. We detected high heritability and variation among Asian landraces and cultivars. Thus, we conducted QTL mapping of pollen number as well as of anther length using nested association mapping lines in which Norin 61 was the common parent. Genomic loci encompassing Green Revolution genes (Rht-B1, Rht-D1, and Ppd-D1) showed significant effects on pollen number and anther length, but their contributions were relatively minor. Although anther length has often been used as a proxy for pollen number in bread wheat, our data showed that their correlations are not necessarily high. Interestingly, we identified a new QTL of pollen number that was not detected by measuring anther length, and, vice versa, a new QTL specific to anther length. These data suggest that pollen number has reduced rapidly in bread wheat but can be modified using the genetic diversity of landraces.

Significance statement: We found that modern cultivars of bread wheat have reduced pollen number and shorter anther length, which are common in domesticated species but can be a challenge for hybrid breeding. Using underutilized Asian landraces and cultivars, we reported that new quantitative trait loci, as well as loci used in the Green Revolution, are responsible for the traits, which can be employed to increase pollen numbers.

19
Temporal changes in allele frequency facilitate detection of adaptive variants in winter wheat (Triticum aestivum L.) breeding programs

Johansen, N. H.; Sarup, P.; Hansen, P.; Orabi, J.; Jahoor, A.; Ramstein, G. P.

2026-05-04 genetics 10.64898/2026.04.30.721918 medRxiv
Top 0.2%
1.1%

In quantitative genetics, candidate SNPs are identified through genotype-phenotype associations inferred with genome-wide association studies (GWAS). In this study, we explore an alternative approach to detect genetic variants with non-neutral effects by tracking temporal trends in allele frequency in a winter wheat (Triticum aestivum L.) breeding population over an eight-year period, from which signals of selection may be inferred. Selection signatures were inferred with a generalized linear model, where we modeled trends in allele frequency as a function of time (crossing year). These signatures of selection were used to prioritize variants. Associations between phenotypic performance and individual load of prioritized variants were then investigated. Furthermore, we assessed whether incorporating selection information into a genomic best linear unbiased prediction (GBLUP) model improves model performance in terms of quality of fit and prediction ability. Our findings indicate that the inferred signals of selection are effective in identifying non-neutral variants. Variants under strong negative selection were associated with a decrease in protein content adjusted for grain yield (p-value < 0.01), while genetic variants that had been under moderate to high levels of positive selection were associated with increased grain yield (p-value < 0.01). However, incorporating selection information did not improve prediction accuracy. In conclusion, temporal trends in allele frequency can be used to detect non-neutral variants. The proposed approach may hence complement traditional quantitative genetic methods for detecting non-neutral genetic variation. This approach may allow breeders to detect non-neutral variants earlier in the breeding cycle, without resorting to phenotypic data.
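To make the idea of modelling allele frequency as a function of crossing year concrete, here is a minimal sketch for a single SNP. It assumes a binomial model with a logit link fitted by Newton-Raphson. The abstract only says "generalized linear model", so the link, the fitting routine, the function name and the counts below are all illustrative assumptions rather than the authors' method.

```python
import math


def fit_allele_trend(years, alt_counts, totals, n_iter=25):
    """Fit logit(p) = a + b * (year - first year); return (a, b).

    alt_counts[i] is the number of copies of the tracked allele among
    totals[i] sampled alleles in years[i]. A positive slope b means the
    allele frequency rose over time; a negative slope means it fell.
    """
    t = [y - years[0] for y in years]
    a = b = 0.0
    for _ in range(n_iter):
        g_a = g_b = 0.0            # gradient of the log-likelihood
        h_aa = h_ab = h_bb = 0.0   # entries of X'WX (negative Hessian)
        for ti, k, n in zip(t, alt_counts, totals):
            p = 1.0 / (1.0 + math.exp(-(a + b * ti)))
            resid = k - n * p
            weight = n * p * (1.0 - p)
            g_a += resid
            g_b += resid * ti
            h_aa += weight
            h_ab += weight * ti
            h_bb += weight * ti * ti
        # Newton step: solve the 2x2 system (X'WX) * delta = gradient.
        det = h_aa * h_bb - h_ab * h_ab
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


# Hypothetical counts for one SNP over eight crossing years.
years = [2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022]
totals = [200] * 8
rising = [40, 46, 55, 61, 70, 82, 90, 101]
intercept, slope = fit_allele_trend(years, rising, totals)
```

Under these assumptions, the sign and size of the fitted slope for each SNP could serve as a simple selection signature for ranking variants. Separating selection from drift would need further modelling that this sketch does not attempt.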

20
A weighted multi-trait approach for heterotic grouping of maize inbred lines under Striga infestation and optimum environments

Abubakar, A. M.; Adejumobi, I. I.; Mengesha, W. A.; Meseka, S.; Oyekunle, M.; Ado, S. G.; Bonkoungou, T. O.; Badu-Apraku, B. A.; Derera, J.

2026-05-16 genetics 10.64898/2026.05.15.725596 medRxiv
Top 0.2%
1.0%

Maximum utilization of existing genetic variability in a breeding program depends on the efficient classification of the inbred lines into heterotic groups, particularly under stress conditions. This study applied practical breeding approaches to determine the mode of genetic inheritance for Striga resistance, proposed a weighted heterotic grouping method based on the general combining ability of multiple traits (WHGCAMT), and compared its effectiveness with other existing methods in classifying the inbred lines into heterotic groups in Striga-infested and optimum environments. Using Diallel design IV, 300 crosses were generated from 21 inbred lines and 4 standard testers. The crosses, along with six checks, were evaluated in an 18 x 17 alpha lattice design with two replications at two locations, in both artificial Striga-infested and Striga-free environments. The inbred lines were genotyped using DArTtag SNP markers. Phenotypic and genotypic data were analyzed using R. Analysis of variance revealed significant mean squares for hybrid, general combining ability (GCA), specific combining ability (SCA) and their interactions with environment. Significant positive and negative GCA and SCA effects were detected for grain yield and other measured traits. However, a larger proportion of additive gene action than non-additive gene action was observed for grain yield and most measured traits. The analysis of molecular variance also showed substantial genetic differences within and between clusters. Except for HSCA, the difference in mean grain yield between the inter-group and intra-group hybrids was significant for each method. Pairwise comparison of the inter- and intra-group hybrids of all the methods showed significant differences between the WHGCAMT and all other methods in most cases. WHGCAMT consistently produced higher-yielding inter-group hybrids and lower-yielding intra-group hybrids, achieving breeding efficiency improvements of 55.8%, 4.3%, 15.7%, and 11.4% over the HSCA, HSGCA, HGCAMT and molecular marker methods, respectively, under Striga infestation. Thus, WHGCAMT offers more precise, reliable and biologically meaningful heterotic groups among early-maturing maize inbred lines.